Distributed Paradigms for Speech and Speaker Recognition
Authors
Abstract
Over recent years, the number of portable devices, such as Personal Digital Assistants and cell phones, has increased dramatically. As a result, there is a growing demand for information management services that are accessible from portable devices, which use wireless networks to connect to distributed servers. In addition, the lack of a keyboard and a mouse in hand-held devices results in slow human-machine interfaces. Automatic Speech Recognition (ASR) is expected to be a key modality in future hand-held devices, making interactions not only much faster but also more natural. However, it is not currently practical to achieve high-quality, large-vocabulary speech recognition in a lightweight, portable device. Even as hardware becomes more powerful, the computational requirements of speech recognizers are expected to keep growing in order to address the problems of dialect and background noise. Therefore, a client-server architecture is a more suitable way to augment hand-held devices with ASR [3]. In the client-server approach envisioned here, the low-power hand-held device performs a limited set of computations (such as feature extraction and quantization) and transmits the resulting parameters to a distant server with enough computational resources to carry out the recognition. The client-server architecture has the added advantage that changes made on the server are transparent to every client. As more transactions are conducted in wireless environments, security will become increasingly important. User authentication by voice provides a good mechanism for adding security without the burden of remembering many different passwords [4], [5], [6], [7]. For this reason, there is already a growing commercial demand for speaker verification technology. Hence, our client-server architecture must be suited to both speech and speaker recognition.
These tasks frequently require slightly different speech analysis methods, which raises the problem of joint coding and transmission of speech parameters. In this thesis, we propose to make two main contributions. On the client side, we introduce source and channel coding techniques that are optimized for recognition performance, and we extend them to handle performance trade-offs across multiple tasks. On the server side, we look at mechanisms for increasing the robustness of a cluster of servers to the loss or compromise of a node. This brings up two issues: scalability and security. Section 2 describes the issues associated with the client side, and Section 3 presents those associated with the server side. Finally, Section 4 provides a short summary of the proposed work.
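The client-side step named in the abstract (feature extraction followed by quantization, then transmission to the server) can be illustrated with a minimal sketch. This is not the coding scheme proposed in the thesis: it assumes a plain uniform scalar quantizer, an assumed feature range of [-1, 1], and 8 bits per coefficient. The function names `quantize`, `dequantize`, `pack_frame`, and `unpack_frame` are hypothetical.

```python
# Illustrative only: a uniform scalar quantizer standing in for the
# client-side coding step. Range, bit depth, and names are assumptions.

def quantize(features, lo=-1.0, hi=1.0, bits=8):
    """Map each feature to an integer index in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    indices = []
    for x in features:
        x = min(max(x, lo), hi)  # clip to the assumed range
        indices.append(round((x - lo) / (hi - lo) * levels))
    return indices

def dequantize(indices, lo=-1.0, hi=1.0, bits=8):
    """Server side: map indices back to approximate feature values."""
    levels = (1 << bits) - 1
    return [lo + i / levels * (hi - lo) for i in indices]

def pack_frame(indices):
    """Client side: one byte per index (valid only for bits <= 8)."""
    return bytes(indices)

def unpack_frame(payload):
    """Server side: recover the integer indices from the payload."""
    return list(payload)

if __name__ == "__main__":
    frame = [-0.8, -0.1, 0.0, 0.35, 0.9]           # made-up feature vector
    payload = pack_frame(quantize(frame))           # what the client sends
    restored = dequantize(unpack_frame(payload))    # what the server sees
    print(len(payload), "bytes sent")
    print([round(v, 3) for v in restored])
```

With these assumptions each coefficient costs one byte on the wireless link, and the reconstruction error is at most half a quantization step. A scheme optimized for recognition performance, as the abstract proposes, would choose the quantizer by its effect on recognition accuracy instead of on reconstruction error.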
Similar Resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words
Session 12: Continuous Speech Recognition And Evaluation II
For the first presentation, Dave Pallett distributed a handout with system descriptions and results. He credited the people involved, and indicated the tight schedule which was met. The tests included three training paradigms: speaker-dependent (SD); longitudinal speaker-dependent (LSD), with much more training speech; and speaker-independent (SI). Tests included 5K and 20K vocabularies, bigram...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...